PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators

Authors

  • Robert Schreiber
  • Shail Aditya
  • Scott A. Mahlke
  • Vinod Kathail
  • B. Ramakrishna Rau
  • Darren C. Cronquist
  • Mukund Sivaraman
Abstract

The PICO-NPA system automatically synthesizes nonprogrammable accelerators (NPAs) to be used as co-processors for functions expressed as loop nests in C. The NPAs it generates consist of a synchronous array of one or more customized processor datapaths, their controller, local memory, and interfaces. The user, or a design space exploration tool that is part of the full PICO system, identifies within the application a loop nest to be implemented as an NPA. The user or tool then specifies the required performance by giving the number of processors and the number of machine cycles each processor uses per iteration of the inner loop. PICO-NPA emits synthesizable HDL that defines the accelerator at the register transfer level (RTL). The system also modifies the user's application software to make use of the generated accelerator. The main objective of PICO-NPA is to reduce design cost and time without significantly reducing design quality. Designing an NPA and its support software with PICO-NPA typically takes one or two weeks, a many-fold improvement over the industry norm. In addition, PICO-NPA can readily generate a wide range of implementations with scalable performance from a single specification. In an experimental comparison of NPAs of equivalent throughput, PICO-NPA designs are slightly more costly than hand-designed accelerators. Logic synthesis and place-and-route have been performed successfully on PICO-NPA designs, which have achieved high clock rates.


Similar resources

High-Level Synthesis of Nonprogrammable Hardware Accelerators

The PICO-N system automatically synthesizes embedded nonprogrammable accelerators to be used as co-processors for functions expressed as loop nests in C. The output is synthesizable VHDL that defines the accelerator at the register transfer level (RTL). The system generates a synchronous array of customized VLIW (very-long instruction word) processors, their controller, local memory, and interf...


Designing Scalable Wireless Application-Specific Accelerators Using PICO High Level Synthesis

This paper presents a system level methodology of designing and exploring scalable and flexible wireless application-specific accelerators. Current hardware designs and implementations for wireless systems have a huge time gap between the development of algorithms for new standards and their hardware implementation. Hardware design using traditional HDL flows has such a long design time that by...


Bitwidth cognizant architecture synthesis of custom hardware accelerators

application-specific design, architecture synthesis, bitwidth, clustering, embedded system, hardware accelerator, operation scheduling, resource allocation PICO is a system for automatically synthesizing embedded hardware accelerators from loop nests specified in the C programming language. A key issue confronted when designing such accelerators is the optimization of hardware by exploiting infor...


Embedded Computing: New Directions in Architecture and Automation

embedded computing, special-purpose architectures, customization, custom architectures, off-the-shelf customizable systems, FPGA, automation, architecture synthesis, hardware-software co-design, processor-compiler codesign, frameworks, constructors, design space exploration, PICO, system synthesis, VLIW synthesis, nonprogrammable accelerator synthesis, cache hierarchy synthesis With...


PARO: A Design Tool for Synthesis of Hardware Accelerators for SoCs

It is a known fact that 90% of the execution time of high performance applications is spent in nested loop programs, which offer a tremendous potential for acceleration due to inherent parallelism. Furthermore, streaming applications consisting of multiple communicating loops from fields of signal processing, medical imaging, financial computing require high performance computing. The FPGAs offe...



Journal:
  • VLSI Signal Processing

Volume 31

Publication date: 2002